Planning While Learning Operators
Abstract
This paper describes issues that arise when integrating a planner with a system that learns planning operators incrementally, and our approaches to addressing these issues. During learning, domain knowledge can be incomplete and incorrect in different ways; the planner must therefore be able to use incomplete domain knowledge. This presents the following challenges for planning: How should the planner effectively generate plans using incomplete and incorrect domain knowledge? How should the planner repair plans upon execution failures? How should planning, learning, and execution be integrated? This paper describes how we address these challenges in the framework of an integrated system, called OBSERVER, that learns planning operators automatically and incrementally. In OBSERVER, operators are learned by observing expert agents and by practicing in a learning-by-doing paradigm. We present empirical results to demonstrate the validity of our approach in a process planning domain. These results show that practicing using our algorithms for planning with incomplete information and plan repair contributes significantly to the learning process.

This research is sponsored by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant number F33615-93-1-1330. Views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of Wright Laboratory or the United States Government. Thanks to Steve Chien, Eugene Fink, Anita Govindjee, Alicia Pérez, and Henry Rowley for helpful comments, and to Jaime Carbonell, Jill Fain, Douglas Fisher, Herb Simon, and Manuela Veloso for their suggestions and support.

Acquiring and maintaining domain knowledge is a key bottleneck in fielding planning systems (Chien et al. 1995). Our approach (Wang 1995) to addressing this issue is to learn planning operators automatically and incrementally by observing expert agents and to refine the operators by practicing with them in a learning-by-doing paradigm (Anzai & Simon 1979). Unfortunately, while learning, the operators can be incomplete and incorrect. The planning system must be able to plan using these operators during practice. This challenges planning in the following ways:

1. Classical planners presume a correct domain model. In our learning system, however, the newly acquired operators are possibly incomplete and incorrect. How can the planner generate plans to solve practice problems?

2. Because incomplete and incorrect operators are used during practice, the plans generated by the planner may be incorrect, which in turn may lead to execution failures. Thus plan repair upon execution failure is necessary. How can the planner effectively repair the incorrect plan using incomplete and incorrect operators?

3. How should planning and execution be interleaved so that the system can solve practice problems effectively and concurrently generate learning opportunities for refining operators using incomplete and incorrect domain knowledge?

This paper describes how we address these challenges in the context of OBSERVER (Wang 1995), a system for automatic acquisition of planning operators by observation and practice. We first present OBSERVER's overall learning architecture and review OBSERVER's operator learning algorithm. OBSERVER learns the preconditions of planning operators in a manner similar to the version spaces concept learning method (Mitchell 1978); that is, it learns a specific representation and a general representation for the preconditions of each operator.
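As one way to picture this version-space-style precondition learning, the sketch below maintains a specific representation as the set of literals common to all successful pre-states. The set-of-ground-literals encoding and the function names are our illustrative assumptions, not OBSERVER's actual data structures or implementation:

```python
# Illustrative sketch only: states and conjunctive preconditions are
# modeled as sets of ground literals (strings). Not OBSERVER's actual
# representation.

def init_specific(first_pre_state):
    # Most specific hypothesis: every literal that was true the first
    # time the operator succeeded is a candidate precondition.
    return set(first_pre_state)

def update_specific(specific, pre_state):
    # Generalize on each further successful application: keep only the
    # literals that held in every successful pre-state so far.
    return specific & set(pre_state)

def holds(preconds, state):
    # Conjunctive preconditions hold iff every literal is in the state.
    return preconds <= set(state)
```

For example, if an operator first succeeds in a state containing (clamped part1), (clean part1), and (has-tool drill), and later succeeds in a state containing only (clamped part1) and (has-tool drill), the specific representation shrinks to the two shared literals. The general representation (not sketched here) would instead be refined using negative examples, i.e., execution failures.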
It learns the operator effects by generalizing the delta-state (the difference between the post-state and the pre-state) from multiple observations. We then describe different types of domain knowledge imperfections and how they affect planning. This is followed by detailed descriptions of our approach to planning and plan repair with imperfect domain knowledge. When solving a problem using incomplete and incorrect operators, OBSERVER first generates an initial plan that achieves the preconditions in the general representation of each operator but does not require achieving preconditions in the specific representation. The planner repairs the plan upon each execution failure, using the specific representation to determine which additional preconditions to achieve in order to make the failed operator applicable. OBSERVER has been implemented on top of a nonlinear planner, PRODIGY4.0 (Carbonell et al. 1992; Veloso et al. 1995). We present empirical results to demonstrate the validity of our approach in a process planning domain (Gil 1991). We discuss the generality of our approach and show that, since our algorithms for planning with incomplete domain knowledge and plan repair rely solely on the representation of the preconditions and effects of the operators, they are general and can be applied to all operator-based planners that use a STRIPS-like operator representation, such as STRIPS (Fikes & Nilsson 1971), TWEAK (Chapman 1987), SNLP (McAllester & Rosenblitt 1991), and UCPOP (Penberthy & Weld 1992).

OBSERVER's Learning Architecture

OBSERVER is a system that learns planning operators by observing expert solution traces and that further refines the operators through practice by solving problems in the environment in a learning-by-doing paradigm (Anzai & Simon 1979). During observation, OBSERVER uses the knowledge that is naturally observable when experts solve problems, without the need for explicit instruction or interrogation.
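As reviewed above, operator effects are learned by generalizing the delta-state between post-state and pre-state across multiple observations. A minimal sketch of that computation, again treating states as sets of ground literals (an illustrative encoding, and a crude, unparameterized stand-in for OBSERVER's actual generalization):

```python
def delta_state(pre_state, post_state):
    # One observation's effect candidates: literals the action added
    # and literals it deleted.
    adds = set(post_state) - set(pre_state)
    dels = set(pre_state) - set(post_state)
    return adds, dels

def generalize_effects(observations):
    # Keep only the adds and deletes seen in every (pre-state,
    # post-state) observation of the same operator.
    adds_list, dels_list = zip(*(delta_state(pre, post)
                                 for pre, post in observations))
    return set.intersection(*adds_list), set.intersection(*dels_list)
```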
During practice, OBSERVER generates its own learning opportunities by solving practice problems. OBSERVER is given the following inputs for learning: (1) the description language for the domain (object types and predicates); (2) experts' solution traces (i.e., action sequences), where each action consists of the name of the operator being executed, the state in which the action is executed (the pre-state), and the state resulting from the action's execution (the post-state); and (3) practice problems (i.e., initial state and goal descriptions) to allow learning-by-doing operator refinement.

OBSERVER's learning algorithms make several assumptions. First, since OBSERVER operates within the framework of classical planners, it assumes that the operators and the states are deterministic. Second, OBSERVER assumes noise-free sensors, i.e., there are no errors in the states. Finally, we observe that in most application domains the majority of the operators have conjunctive preconditions only; OBSERVER therefore assumes that the operators have conjunctive preconditions. This assumption greatly reduces the search space for operator preconditions without sacrificing much of the generality of the learning approach.

Figure 1 shows the architecture of our learning system, OBSERVER. There are three main components:

Learning operators from observation: OBSERVER inductively learns an initial set of operators incrementally by analyzing expert solutions. The initial knowledge OBSERVER starts with is kept to a minimum. Details of the learning algorithm are described in (Wang 1995); we briefly review the learning methods in the next section.

Planning, plan repair, and execution: The initial set of operators learned from observation can be incomplete and incorrect in certain ways, and must be refined by analyzing the system's own execution traces during practice. Given a practice problem, OBSERVER first generates an initial plan to solve the problem.
The initial plan is then executed in the environment. Each plan step results in either a successful or an unsuccessful execution of an operator. OBSERVER uses the results of these executions as positive or negative training examples for further operator refinement. Upon execution failures, the planner also repairs the failed plans and executes the repaired plans. This process repeats until the problem is solved or until a resource bound is exceeded. This component, i.e., planning with incomplete operators and plan repair, is the focus of this paper.

Refining operators during practice: The successful and unsuccessful executions generated during practice are effective training examples that OBSERVER uses to further refine the initial imperfect operators. Again, the refinement method is described in detail in (Wang 1995); we briefly review the learning methods in the next section.
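The plan-execute-repair cycle described above can be sketched as follows. Here `execute` and `repair` are hypothetical stand-ins for the environment interface and the repair procedure, and for simplicity this sketch restarts the repaired plan from the beginning rather than resuming mid-plan:

```python
def practice_loop(initial_plan, execute, repair, bound=25):
    # Run plan steps in order; record each execution as a positive or
    # negative training example; on failure, repair the plan and retry.
    # Stop when the plan runs to completion or the bound is exceeded.
    plan, examples = list(initial_plan), []
    for _ in range(bound):
        for i, step in enumerate(plan):
            ok = execute(step)
            examples.append((step, ok))      # training example for refinement
            if not ok:
                plan = repair(plan, i)       # e.g. achieve missing preconditions
                break
        else:
            return True, examples            # problem solved
    return False, examples                   # resource bound exceeded
```

In a toy run, a "drill" step might fail because a clamping precondition is unmet; the repair procedure splices a "clamp" step in front of it, and the repaired plan then executes to completion, yielding both a negative example (the failed drill) and positive examples for refinement.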